F2N-Rank: Domain Keywords Extraction Algorithm

Authors

  • Zhijuan Wang
  • Yinghui Feng
Abstract

Domain keyword extraction is important for information extraction, information retrieval, classification, clustering, and topic detection and tracking, among other tasks. TextRank is a common graph-based algorithm for keyword extraction, but it takes only edge weights into account. We propose F2N-Rank, a new text ranking formula that takes both edge weights and node weights into account. Experiments show that F2N-Rank clearly outperforms both TextRank and ATF*DF: in keyword extraction for the Tibetan religion domain, it achieves the highest average precision (78.6%), about 16% higher than TextRank and 29% higher than ATF*DF.
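
The abstract does not state the F2N-Rank formula, so the sketch below is not the authors' method. It only illustrates the general idea of letting node weights influence a TextRank-style ranking alongside edge weights, here by using normalized node weights as the teleport distribution of a weighted PageRank iteration. The function names (`build_graph`, `rank`), the co-occurrence window, the damping factor, and the toy tokens are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph; edge weight = co-occurrence count
    within `window` following tokens (an assumed weighting scheme)."""
    edges = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                edges[w][v] += 1.0
                edges[v][w] += 1.0
    return edges


def rank(edges, node_weight=None, d=0.85, iters=50):
    """Weighted PageRank-style iteration.

    With node_weight=None every node gets the same prior, which gives an
    edge-weight-only, TextRank-like ranking (normalized variant).
    With node_weight given, the prior is biased toward heavier nodes.
    This is a generic illustration, not the F2N-Rank formula.
    """
    nodes = sorted(edges)
    node_weight = node_weight or {}
    raw = {n: node_weight.get(n, 1.0) for n in nodes}
    total = sum(raw.values())
    prior = {n: raw[n] / total for n in nodes}
    out_sum = {n: sum(edges[n].values()) for n in nodes}
    score = dict(prior)
    for _ in range(iters):
        score = {
            n: (1 - d) * prior[n]
            + d * sum(score[m] * edges[m][n] / out_sum[m] for m in edges[n])
            for n in nodes
        }
    return score


# Toy usage with made-up tokens and a made-up node weight.
tokens = "tibetan buddhism temple monk buddhism temple lama monk temple".split()
graph = build_graph(tokens)
edge_only = rank(graph)
with_node_weights = rank(graph, node_weight={"lama": 5.0})
for term in sorted(graph, key=with_node_weights.get, reverse=True):
    print(term, round(edge_only[term], 3), round(with_node_weights[term], 3))
```

In this sketch, a node weight could come from any domain signal (for example a document-frequency statistic); which signal F2N-Rank actually uses is not stated in the abstract.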

Similar resources

Improved Automatic Keyword Extraction Based on TextRank Using Domain Knowledge

Keyword extraction from scientific articles is beneficial for retrieving scientific articles on a certain topic and grasping the trend of academic development. For the task of keyword extraction for Chinese scientific articles, we adopt the framework of selecting keyword candidates by Document Frequency Accessor Variety (DF-AV) and running the TextRank algorithm on a phrase network. To improve domain ...

Passage Retrieval for Information Extraction using Distant Supervision

In this paper, we propose a keyword-based passage retrieval algorithm for information extraction, trained by distant supervision. Our goal is to be able to extract attributes of people and organizations more quickly and accurately by first ranking all the potentially relevant passages according to their likelihood of containing the answer and then performing a traditional deeper, slower analysi...

Optimizing information retrieval in question answering using syntactic annotation

One of the bottlenecks in open-domain question answering (QA) systems is the performance of the information retrieval (IR) component. In QA, IR is used to reduce the search space for answer extraction modules, and therefore its performance is crucial for the success of the overall system. However, natural language questions are different from the sets of keywords used in traditional IR. In this study...

Log based Keyword Extraction and Spread based Clustering for an Efficient Information Searching

Today, efficient information search is very important for extracting and analyzing user requirements in the vast amount of web information. For this reason, this paper proposes a log based keyword extraction method which finds the associated keywords in a certain domain. This paper also proposes a spread based clustering method for clustering the keywords with high association among the keyword-...

Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection

This paper regards the query keywords selection problem in source retrieval as learning a ranking model to choose the method of keywords extraction over suspicious document segments. Four basic methods are used in our ranking function: BM25, TFIDF, TF and EW. Then, a ranking model based on Ranking SVM is proposed to rank the query keywords group which is contributed to get the higher evaluation...

Publication date: 2015